Quality Evaluation for Document Relation Discovery Using Citation Information
Abstract
Assessment of discovered patterns is an important issue in the field of knowledge discovery. This paper presents an evaluation method that utilizes citation (reference) information to assess the quality of discovered document relations. With the concept of transitivity as direct/indirect citations, a series of evaluation criteria is introduced to define the validity of discovered relations. Two kinds of validity, called soft validity and hard validity, are proposed to express the quality of the discovered relations. For the purpose of impartial comparison, the expected validity is statistically estimated based on the generative probability of each relation pattern. The proposed evaluation is investigated using more than 10,000 documents obtained from a research publication database. With frequent itemset mining as a process to discover document relations, the proposed method was shown to be a powerful way to evaluate the relations in four aspects: soft/hard scoring, direct/indirect citation, relative quality over the expected value, and comparison to human judgment.
Key words: document relations; frequent itemset mining; citation matrix; quality evaluation; document relation evaluation
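The direct/indirect citation idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's method: the abstract does not define soft or hard validity, so the score below (`linked_pair_fraction`) is a hypothetical stand-in, and treating citations as undirected links is an assumption. The function names and the toy data are likewise invented for illustration.

```python
from collections import deque
from itertools import combinations


def build_undirected(citations):
    """Turn {doc: set of cited docs} into an undirected adjacency map.

    Assumption: citation direction is ignored, so "A cites B" links both ways.
    """
    adj = {}
    for doc, refs in citations.items():
        adj.setdefault(doc, set())
        for ref in refs:
            adj[doc].add(ref)
            adj.setdefault(ref, set()).add(doc)
    return adj


def citation_distance(adj, source, target):
    """Fewest citation links between two documents, or None if unconnected.

    Distance 1 is a direct citation; larger values are indirect citations.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        doc, dist = queue.popleft()
        for nxt in adj.get(doc, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def linked_pair_fraction(adj, docs, max_order):
    """Hypothetical score: share of document pairs in a discovered relation
    that are connected by at most max_order citation links."""
    pairs = list(combinations(sorted(docs), 2))
    if not pairs:
        return 0.0
    linked = 0
    for a, b in pairs:
        dist = citation_distance(adj, a, b)
        if dist is not None and dist <= max_order:
            linked += 1
    return linked / len(pairs)


# Toy data (invented): A cites B, B cites C, D is isolated.
citations = {"A": {"B"}, "B": {"C"}, "D": set()}
adj = build_undirected(citations)
```

With this toy data, a relation {A, B, C} is only partly supported when direct citations alone are counted, and fully supported once one intermediate document is allowed (`max_order=2`).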
Similar resources
Improving MeSH classification of biomedical articles using citation contexts
Medical Subject Headings (MeSH) are used to index the majority of databases generated by the National Library of Medicine. Essentially, MeSH terms are designed to make information, such as scientific articles, more retrievable and assessable to users of systems such as PubMed. This paper proposes a novel method for automating the assignment of biomedical publications with MeSH terms that takes ...
Designing an Ontology for Knowledge Discovery in Iran’s Vaccine
Ontology is a requirement engineering product and the key to knowledge discovery. It includes the terminology to describe a set of facts, assumptions, and relations with which the detailed meanings of vocabularies among communities can be determined. This is a qualitative content analysis research. This study has made use of ontology for the first time to discover the knowledge of vaccine in Ir...
Cloud Service Discovery in Persian through Ontology Evolution
Abstract The cloud computing is undoubtedly a great achievement of the computer networks. In this environment, various services have been provided but users should take the trouble to find the services they need. Although researchers have tried to solve the needs of users to information on the web, their studies enjoy strengths and weaknesses and there is no comprehensive system for the disc...
Self-Citation in the Mirror of Ethics
Self-citation is a behavior that is seen to varying degrees in researchers, research centers and medical journals. The question is whether self-citation is moral or not. This is a descriptive and analytical study (library and document research). Two main keywords (self-citation and ethics) were used for searching databases. In addition, efforts have been made for moral evaluation of self-citat...
Quantitative and Qualitative Study of Scientific Productions in the Field of “Quran and Health”
Background and Purpose: The qualitative and quantitative study of scientific productions in different fields, as well as the review of the papers indexed in reliable citation indexes, has become a modern approach defined as scientometrics. The current research aimed to assess the position of scientific productions of researchers in the field of Quran and health in the Scopus database in terms o...
Journal: IEICE Transactions
Volume: 90-D
Publication date: 2007